Semantic Naïve Bayes Classifier for Document Classification

نویسندگان

  • How Jing
  • Yu Tsao
  • Kuan-Yu Chen
  • Hsin-Min Wang
چکیده

In this paper, we propose a semantic naïve Bayes classifier (SNBC) to improve the conventional naïve Bayes classifier (NBC) by incorporating “document-level” semantic information for document classification (DC). To capture the semantic information from each document, we develop semantic feature extraction and modeling algorithms. For semantic feature extraction, we first apply a log-Bilinear document modeling (LBDM) algorithm to transform each word into a semantic vector, and then apply principal component analysis (PCA) to the semantic space formed by the word vectors to extract a set of semantic features for each document. For semantic modeling, a semantic model is constructed using the semantic features of the training documents. In the testing phase, SNBC systematically integrates the semantic model and the conventional NBC to perform DC. The results of experiments on the 20 News-groups and WebKB datasets confirm that, with the semantic score, SNBC consistently outperforms NBC with various language modeling approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification Using Naïve Bayes- a Survey

Classification, particularly Text Classification, is a supervised learning approach categorizing into various categories, the available training set of correctly identified observations analyzed into a set of features. There are many phases involved in classification. The main classification phase involves the use of classification algorithms or classifiers. Among the various classifiers, the N...

متن کامل

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

Improving on the Naïve Bayes Document Classifier

The Naïve Bayes document classifier has been used in many document classification algorithms [1], but is only really useful on a small subset of documents due to it’s many shortcomings [2]. By augmenting the basic functionality of the simple Naïve Bayes classifier, the classification algorithm can be applied to a much wider range of documents. This paper investigates the advantages which can be...

متن کامل

Field Association words with Naive Bayes Classifier based Arabic document classification

Document classification aims to assign a document to one or more categories based on its contents. This paper suggests the use of Field association (FA) words algorithm with Naïve Bayes Classifier to the problem of document categorization of Arabic language Our experimental study shows that using FA algorithm with Naïve Bayes (NB) Classifier gives the ~ 79% average accuracy and, using compound ...

متن کامل

Is Naïve Bayes a Good Classifier for Document Classification?

Document classification is a growing interest in the research of text mining. Correctly identifying the documents into particular category is still presenting challenge because of large and vast amount of features in the dataset. In regards to the existing classifying approaches, Naïve Bayes is potentially good at serving as a document classification model due to its simplicity. The aim of this...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013